Abstract

A set X of vertices of an acyclic digraph D is convex if $X \neq \emptyset$
and there is no directed path between vertices of X which contains a vertex
not in X. A set X is connected if $X \neq \emptyset$ and the underlying
undirected graph of the subgraph of D induced by X is connected. Connected
convex sets and convex sets of acyclic digraphs are of interest in the area
of modern embedded processor technology. We construct an algorithm A for
enumeration of all connected convex sets of an acyclic digraph D of order n.
The time complexity of A is $O(n \cdot cc(D))$, where $cc(D)$ is the number
of connected convex sets in D. We also give an optimal algorithm for
enumeration of all (not just connected) convex sets of an acyclic digraph D
of order n. In computational experiments we demonstrate that our algorithms
outperform the best algorithms in the literature.

Using the same approach as for A, we design an algorithm for generating all
connected sets of a connected undirected graph G. The complexity of the
algorithm is $O(n \cdot c(G))$, where n is the order of G and $c(G)$ is the
number of connected sets of G. The previously reported algorithm for
connected set enumeration has running time $O(mn \cdot c(G))$, where m is
the number of edges in G.

1 Introduction 

A set X of vertices of an acyclic digraph D is convex if $X \neq \emptyset$
and there is no directed path between vertices of X which contains a vertex
not in X. A set X is connected if $X \neq \emptyset$ and the underlying
undirected graph of the subgraph of D induced by X is connected. A set is
connected convex (a cc-set) if it is both connected and convex.
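
The two definitions can be checked directly on a small instance. The following is a minimal Python sketch, not part of the algorithms of this paper: the function names and the 5-vertex test digraph are illustrative assumptions, and the checks run plain graph searches with no attempt at efficiency.

```python
def reachable(out, sources):
    """Vertices reachable from `sources` by a directed path with at least one arc."""
    seen, stack = set(), list(sources)
    while stack:
        u = stack.pop()
        for w in out.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_convex(arcs, X):
    """True if X is non-empty and no directed path between vertices of X leaves X."""
    X = set(X)
    if not X:
        return False
    out = {}
    for u, v in arcs:
        out.setdefault(u, set()).add(v)
    # A vertex y outside X breaks convexity if X reaches y and y reaches X.
    for y in reachable(out, X) - X:
        if reachable(out, [y]) & X:
            return False
    return True


def is_connected(arcs, X):
    """True if X is non-empty and the underlying undirected graph of D[X] is connected."""
    X = set(X)
    if not X:
        return False
    nbrs = {x: set() for x in X}
    for u, v in arcs:
        if u in X and v in X:
            nbrs[u].add(v)
            nbrs[v].add(u)
    start = next(iter(X))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in nbrs[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == X


# Hypothetical digraph: two directed paths 1->2->3->5 and 1->4->5.
arcs = [(1, 2), (2, 3), (3, 5), (1, 4), (4, 5)]
print(is_convex(arcs, {1, 3}), is_connected(arcs, {2, 4}))
```

On this digraph the set {1, 3} is not convex, since the path from 1 to 3 passes through 2, while {2, 4} is convex but not connected.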

In Section 3 we introduce and study an algorithm A for generating all
connected convex sets of a connected acyclic digraph D of order n. The
running time of A is $O(n \cdot cc(D))$, where $cc(D)$ is the number of
connected convex sets in D. Thus, the algorithm is (almost) optimal with
respect to its time complexity. Interestingly, to generate only k cc-sets
using A we need $O(n^{2.376} + kn)$ time. In Section 5 we give experimental
results demonstrating that the algorithm is practical on reasonably large
data dependency graphs for basic blocks generated from target code produced
by Trimaran [22] and SimpleScalar [3]. Our experiments show that A is better
than the state-of-the-art algorithm of Chen, Maskell and Sun [6]. Moreover,
unlike the algorithm in [6], our algorithm has a provable (almost) optimal
worst-case time complexity.

Although such algorithms are of less importance in our application area
because of wider scheduling issues, there also exist algorithms that
enumerate all of the convex sets of an acyclic digraph. Until recently the
algorithm of choice for this problem was that of Atasu, Pozzi and Ienne
[2, 19]; however, the CMS algorithm [6] (run in general mode) outperforms
the API algorithm in most cases. In Section 4 we give a different algorithm
for enumeration of all the convex sets of an acyclic digraph, which
significantly outperforms the CMS and API algorithms and which has an
(optimal) running time of the order of the sum of the sizes of the convex
sets.

Avis and Fukuda [4] designed an algorithm for generating all connected sets
in a connected graph G of order n and size m with time complexity
$O(mn \cdot c(G))$ and space complexity $O(n + m)$, where $c(G)$ is the
number of connected sets in G. Observe that when G is bipartite there is an
orientation D of G such that every connected set of G corresponds to a
cc-set of D and vice versa. To obtain D, orient every edge of G from X to Y,
where X and Y are the partite sets of G.

The algorithm of Avis and Fukuda is based on a so-called reverse search.
Applying the approach used to design the algorithm A to connected set
enumeration, in a later section we describe an algorithm C for generating
all connected sets in a connected graph G of order n with much better time
complexity, $O(n \cdot c(G))$. This demonstrates that our approach can be
applied with success to various vertex set/subgraph enumeration problems.
The space complexity of our algorithm matches that of the algorithm of Avis
and Fukuda.

1.1 Algorithms Applications 

There is an immediate application for A in the field of so-called custom 
computing in which central processor architectures are parameterized for 
particular applications. 

An embedded or application specific computing system only ever executes 
a single application. Examples include automobile engine management sys- 
tems, satellite and aerospace control systems and the signal processing parts 
of mobile cellular phones. Significant improvements in the price-performance 
ratio of such systems can be achieved if the instruction set of the application 
specific processor is specifically tuned to the application. 

This approach has become practical because many modern integrated circuit 
implementations are based on Field Programmable Gate Arrays (FPGA). 
An FPGA comprises an array of logic elements and a programmable rout- 
ing system, which allows detailed design of logic interconnection to be per- 
formed directly by the customer, rather than a complete (and very high 
cost) custom integrated circuit having to be produced for each application. 
In extreme cases, the internal logic of the FPGA can even be modified whilst 
in operation. 

Suppliers of embedded processor architectures are now delivering extensible
versions of their general purpose processors. Examples include the ARM
OptimoDE [U, the MIPS Pro Series [H] and the Tensilica Xtensa [21]. The
intention is that these architectures be implemented either as traditional
logic with an accompanying FPGA containing the hardware for extension
instructions, or be completely implemented within a large FPGA. By this
means, hardware development has achieved a new level of flexibility, but
sophisticated design tools are required to exploit its potential.

The goal of such tools is the identification of time critical or commonly 
occurring patterns of computation that could be directly implemented in 
custom hardware, giving both faster execution and reduced program size, 
because a sequence of base machine instructions is being replaced by a single 
custom extension instruction. For example, a program solving simultaneous 
linear equations may find it useful to have a single instruction to perform 
matrix inversion on a set of values held in registers. 

The approach proceeds by first locating the basic blocks of the program,
regions of sequential computation with no control transfers into them. For
each basic block we construct a data dependency graph (DDG) which contains
a vertex for each base (unextended) instruction in the block, along with a
vertex for each initial input datum. Figure 1 shows an example of a DDG.
There is an arc to the vertex for the instruction u from each vertex whose
instruction computes an input operand of u. DDGs are acyclic because
execution within a basic block is by definition sequential.
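
The construction just described can be sketched in a few lines of Python. This is an illustration only: the three-address instruction format `(dest, op, operands)`, the vertex naming scheme and the function name `build_ddg` are assumptions made for the example and are not taken from Trimaran, SimpleScalar or any other tool.

```python
def build_ddg(block):
    """Build a DDG from a basic block.

    `block` is a list of (dest, op, operands) triples in program order
    (a hypothetical instruction format). Returns (vertices, arcs).
    """
    producer = {}          # value name -> vertex that currently produces it
    vertices, arcs = [], []
    for idx, (dest, op, operands) in enumerate(block):
        node = f"{idx}:{op}"
        vertices.append(node)
        for name in operands:
            if name not in producer:
                # Value not computed inside the block: an initial input datum.
                producer[name] = f"in:{name}"
                vertices.append(producer[name])
            arcs.append((producer[name], node))
        producer[dest] = node
    return vertices, arcs


block = [
    ("t1", "ADD", ["a", "b"]),
    ("t2", "SUB", ["t1", "c"]),
    ("t3", "MUL", ["t1", "t2"]),
]
vertices, arcs = build_ddg(block)
print(len(vertices), arcs)
```

Every arc goes from a vertex created earlier to the instruction currently being processed, so the resulting digraph is acyclic by construction.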

Extension instructions are combinations of base machine instructions and
are represented by sets of vertices of the DDG. In Figure 1, sections A and
B are convex sets that represent candidate extension instructions. However,
section B is not connected. If such a region were implemented as a single
extension instruction we should have separate independent hardware units
within the instruction. Although this presents no special difficulties, and
in Section 4 we give an optimal algorithm for constructing all such sets,
present engineering practice is to restrict the search to connected convex
components on the grounds that unconnected convex components are composed
of connected ones, and that the system's code scheduler will perform better
if it is allowed to arrange the independent computations in different ways
at different points in the program.

Unlike connectivity, however, convexity is not optional. An extension
instruction cannot perform computations that depend on instructions external
to the extension instruction. This means that there can be no data flows
out of and then back into the extension instruction: the set corresponding
to an extension instruction must be convex. Thus section C in Figure 1
does not represent a candidate extension instruction since it breaches the
'no external computation rule' because it is non-convex: there is a path via
the SUB node that is not in the set.

Figure 1: Data dependency graph for

Ideally we would like to fully consider all possible candidate instructions
and select the combination which results in the most efficient
implementation. In practice this is unlikely to be feasible as, in the worst
case, the number of candidates will be exponential in the number of original
program instructions. However, it is useful to have a process which can find
all the potential instructions, even if the set of instructions used for
final consideration has to be restricted. In this work we only deal with
generation of a set of possible candidate instructions. Interested readers
can refer to [19].



1.2 Related Theoretical Research 

Many other algorithms for special vertex set/subgraph generation have been
studied in the literature. Kreher and Stinson [16] describe an algorithm for
generating all cliques in a graph G of order n with running time
$O(n \cdot cl(G))$, where $cl(G)$ is the number of cliques in G.

Several algorithms have been suggested for the generation of all spanning
trees in a connected graph G of order n and size m. Let t be the number of
spanning trees in G. The first spanning tree generating algorithms
[11, 17, 20] used backtracking, which is useful for enumerating various
kinds of subgraphs such as paths and cycles. Using the algorithms from
[17, 20], Gabow and Myers [11] suggested an algorithm with time complexity
$O(tn + n + m)$ and space complexity $O(n + m)$. If we output all spanning
trees by their edges, this algorithm is optimal in terms of time and space
complexities. Later, algorithms of a different type were developed; these
algorithms (see, e.g., [15, 23, 24]) find a new spanning tree by exchanging
a pair of edges. As a result, the algorithms of Kapoor and Ramesh [15] and
Shioura and Tamura [23] require only $O(t + n + m)$ time and $O(nm)$ space.
The algorithm of Shioura, Tamura and Uno [24] has the same optimal running
time, and also optimal space: $O(n + m)$.

An out-tree is an orientation of a tree such that all vertices but one are
of in-degree 1. Kapoor, Kumar and Ramesh [TJ] presented an algorithm for
enumerating all spanning out-trees of a digraph with n vertices, m arcs and
t spanning out-trees. The algorithm takes $O(\log n)$ time per spanning
tree; more precisely, it runs in $O(t \log n + n^2 \alpha(n, n) + nm)$,
where $\alpha$ is the inverse Ackermann function. It first outputs a single
spanning out-tree and then a list of arc swaps; each spanning out-tree can
be generated from the first spanning out-tree by applying a prefix of this
sequence of arc swaps.

2 Terminology, Notation and Preliminaries 

Let D be a digraph. If xy is an arc of D ($xy \in A(D)$), we say that y is
an out-neighbor of x and x is an in-neighbor of y. The set of out-neighbors
of x is denoted by $N^+_D(x)$ and the set of in-neighbors of x is denoted by
$N^-_D(x)$. For a set X of vertices of D, its out-neighborhood (resp.
in-neighborhood) is $N^+_D(X) = \bigcup_{x \in X} N^+_D(x) \setminus X$
(resp. $N^-_D(X) = \bigcup_{x \in X} N^-_D(x) \setminus X$). A digraph
$D^{TC}$ is called the transitive closure of D if $V(D^{TC}) = V(D)$ and a
vertex x is an in-neighbor of a vertex y in $D^{TC}$ if and only if there is
a path from x to y in D.

Let S be a non-empty set of vertices of a digraph D. A directed path P of D
is an S-path if P has at least three vertices, its initial and terminal
vertices are in S and the rest of the vertices are not in S. For a digraph
D, $CC(D)$ ($CO(D)$) denotes the collection of cc-sets (convex sets) in D;
$cc(D) = |CC(D)|$ and $co(D) = |CO(D)|$. An ordering $v_1, v_2, \ldots, v_n$
of vertices of an acyclic digraph D is called acyclic if for every arc
$v_i v_j$ of D we have $i < j$.
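
An acyclic ordering can be computed by repeatedly removing vertices of in-degree zero. The sketch below uses this standard technique (Kahn's algorithm); it is given only to make the definition concrete, and the function name and the sample digraph are assumptions of the example.

```python
from collections import deque


def acyclic_ordering(vertices, arcs):
    """Return an ordering in which every arc goes from an earlier to a later vertex."""
    indeg = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for u, w in arcs:
        out[u].append(w)
        indeg[w] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in out[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != len(indeg):
        raise ValueError("the digraph contains a directed cycle")
    return order


# Hypothetical digraph: 1->2->3->5 and 1->4->5, vertices given in reverse.
arcs = [(1, 2), (2, 3), (3, 5), (1, 4), (4, 5)]
order = acyclic_ordering([5, 4, 3, 2, 1], arcs)
position = {v: i for i, v in enumerate(order)}
print(order)
```

The ordering is generally not unique; any ordering returned satisfies the defining property that `position[u] < position[w]` for every arc `(u, w)`.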

Lemma 2.1. Let D be a connected acyclic digraph and let S be a vertex set
in D. Then S is a cc-set in D if and only if it is a cc-set in $D^{TC}$.

Proof. Let S be a set of vertices of D. We will first prove that there is an
S-path in D if and only if there is an S-path in $D^{TC}$. Since all arcs of
D are in $D^{TC}$, every S-path in D is an S-path in $D^{TC}$. Let
$Q = x_1 x_2 \ldots x_q$ be an S-path in $D^{TC}$. Then there are paths
$P_2, P_3, \ldots, P_q$ such that
$Q' = x_1 P_2 x_2 P_3 x_3 \ldots x_{q-1} P_q x_q$ is a path in D ($Q'$ must
be a path since D is acyclic). Since $x_1$ and $x_q$ belong to S and $x_2$
does not belong to S, there is a subpath of $Q'$ which is an S-path.

If S is connected in D then it is clearly connected in $D^{TC}$, which
implies that if S is a cc-set in D then it is a cc-set in $D^{TC}$. Now let
S be a cc-set in $D^{TC}$. Assume that $D[S]$ is not connected and let x and
y be vertices in different connected components of $D[S]$ which are
connected by an arc in $D^{TC}$. Without loss of generality xy is the arc in
$D^{TC}$ and Q is a path from x to y in D. However, as S is convex all
vertices in Q also belong to S and therefore x and y belong to the same
connected component of $D[S]$, a contradiction. □



It is well-known (see, e.g., the paper [9] by Fisher and Meyer, or [10] by
Furman) that the transitive closure problem and the matrix multiplication
problem are closely related: there exists an $O(n^a)$-algorithm, with
$a \geq 2$, to compute the transitive closure of a digraph of order n if and
only if the product of two boolean $n \times n$ matrices can be computed in
$O(n^a)$ time. Coppersmith and Winograd [7] showed that there exists an
$O(n^{2.376})$-algorithm for matrix multiplication. Thus, we have the
following:

Theorem 2.2. The transitive closure of a digraph of order n can be found
in $O(n^{2.376})$ time.
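
For small instances the transitive closure can also be obtained without fast matrix multiplication. The sketch below is a deliberately simple substitute that runs one graph search per vertex, for a total of O(n(n + m)) time; it does not achieve the bound of Theorem 2.2. The function name and the sample digraph are assumptions of the example.

```python
def transitive_closure(vertices, arcs):
    """Return the arc set of the transitive closure, one search per vertex."""
    out = {v: set() for v in vertices}
    for u, w in arcs:
        out[u].add(w)
    closure = set()
    for s in vertices:
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()
            for w in out[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        # s gets an arc to every vertex it can reach.
        closure.update((s, w) for w in seen)
    return closure


# Hypothetical digraph: 1->2->3->5 and 1->4->5.
arcs = {(1, 2), (2, 3), (3, 5), (1, 4), (4, 5)}
closure = transitive_closure([1, 2, 3, 4, 5], arcs)
print(sorted(closure - arcs))
```

For this digraph the closure adds exactly the three arcs (1, 3), (1, 5) and (2, 5).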

We will need the following two results proved in [12].

Theorem 2.3. For every connected acyclic digraph D of order n,
$cc(D) \geq n(n+1)/2$. If an acyclic digraph D of order n has a Hamiltonian
path, then $cc(D) = n(n+1)/2$.

Theorem 2.4. Let $f(n) = 2^n + n + 1 - d_n$, where $d_n = 2 \cdot 2^{n/2}$
for every even n and $d_n = 3 \cdot 2^{(n-1)/2}$ for every odd n. For every
connected acyclic digraph D of order n, $cc(D) \leq f(n)$. Let
$\vec{K}_{p,q}$ denote the digraph obtained from the complete bipartite
graph $K_{p,q}$ by orienting every edge from the partite set of cardinality
p to the partite set of cardinality q. We have $cc(\vec{K}_{a,n-a}) = f(n)$
provided $|n - 2a| \leq 1$.
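
Both bounds can be checked on small digraphs by exhaustive search. The following Python sketch is a brute-force check under the definitions of this section, exponential in n and intended only as a sanity test; the helper names are assumptions of the example.

```python
from itertools import combinations


def f(n):
    """Upper bound of Theorem 2.4."""
    d = 2 * 2 ** (n // 2) if n % 2 == 0 else 3 * 2 ** ((n - 1) // 2)
    return 2 ** n + n + 1 - d


def count_cc_sets(n, arcs):
    """Count connected convex sets of a digraph on vertices 0..n-1 by brute force."""
    out = [set() for _ in range(n)]
    for u, w in arcs:
        out[u].add(w)

    def descendants(sources):
        seen, stack = set(), list(sources)
        while stack:
            u = stack.pop()
            for w in out[u] - seen:
                seen.add(w)
                stack.append(w)
        return seen

    def connected(S):
        start = next(iter(S))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in S - seen:
                if w in out[u] or u in out[w]:
                    seen.add(w)
                    stack.append(w)
        return seen == S

    def convex(S):
        # No outside vertex may lie on a directed path between vertices of S.
        return all(not (descendants([y]) & S) for y in descendants(S) - S)

    return sum(
        1
        for size in range(1, n + 1)
        for S in map(set, combinations(range(n), size))
        if connected(S) and convex(S)
    )


path6 = [(i, i + 1) for i in range(5)]                # Hamiltonian path on 6 vertices
k23 = [(u, w) for u in (0, 1) for w in (2, 3, 4)]     # oriented K_{2,3}
k33 = [(u, w) for u in (0, 1, 2) for w in (3, 4, 5)]  # oriented K_{3,3}
print(count_cc_sets(6, path6), count_cc_sets(5, k23), count_cc_sets(6, k33))
```

The path attains the lower bound n(n+1)/2 = 21, and the two oriented complete bipartite digraphs attain f(5) = 26 and f(6) = 55, in agreement with Theorems 2.3 and 2.4.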

3 Algorithm for Generating CC-Sets of an Acyclic 
Digraph 

In this section D denotes a connected acyclic digraph of order n and size m.
Now we describe the main algorithm of this paper; we denote it by A. The
input to A is D, and A outputs all cc-sets of D. The formal description of A
is followed by an example and proofs of the correctness of A and of its
complexity. Finally, we show that to produce k cc-sets A requires
$O(n^{2.376} + kn)$ time.

The algorithm works as follows. Given a digraph D on n vertices, it
considers an acyclic ordering $v_1, \ldots, v_n$ of the transitive closure
of D. For each vertex $v_i$ we consider the sets $X = \{v_i\}$ and
$Y = \{v_{i+1}, \ldots, v_n\}$ and call the subroutine $B(X, Y, D)$, which
finds all cc-sets S in D such that $X \subseteq S \subseteq X \cup Y$. At
each step, if possible, $B(X, Y, D)$ removes an element v from Y and adds it
to X. If X has out-neighbors in Y we choose v to be the 'largest'
out-neighbor in the acyclic ordering (line 3), otherwise if X has
in-neighbors in Y we choose v to be the 'smallest' in-neighbor (line 8).
Then we find the other vertices required to maintain convexity (line 4 or
line 9). If there are no in- or out-neighbors we output X, otherwise we find
all the cc-sets such that $X \subseteq S \subseteq X \cup Y$ and $v \in S$
(line 12) and then all the cc-sets such that
$X \subseteq S \subseteq X \cup Y$ and $v \notin S$ (line 13).

Step 1: Find the transitive closure of D and set $D = D^{TC}$.

Step 2: Find an acyclic ordering $v_1, v_2, \ldots, v_n$ of D.

Step 3: For each $i = 1, 2, \ldots, n$ do the following. Set
$X := \{v_i\}$, $Y := \{v_{i+1}, v_{i+2}, \ldots, v_n\}$ and call
$B(X, Y, D)$.

Step 4 subroutine $B(X, Y, D)$:

1. set $A = N^+_{D^{TC}}(X) \cap Y$
2. if $A \neq \emptyset$ {
3. set $v = v_j$, where $j = \max\{i : v_i \in A\}$
4. set $R = \{v\} \cup (N^-_{D^{TC}}(v) \cap A)$ }
5. else {
6. set $B = N^-_{D^{TC}}(X) \cap Y$
7. if $B \neq \emptyset$ {
8. set $v = v_k$, where $k = \min\{i : v_i \in B\}$
9. set $R = \{v\} \cup (N^+_{D^{TC}}(v) \cap B)$ } }
10. if $A = \emptyset$ and $B = \emptyset$ { output X }
11. else {
12. $B(X \cup R, Y \setminus R, D)$
13. $B(X, Y \setminus \{v\}, D)$ } }
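
Steps 1 to 4 translate almost line by line into Python. The sketch below is an illustration under stated assumptions, not the authors' implementation: the names `enumerate_cc_sets`, `succ`, `pred` and `Bset` are hypothetical, the transitive closure is computed by one search per vertex rather than by matrix multiplication, and the neighborhoods of X are recomputed in every call instead of being passed as parameters, so the code does not attain the running time claimed for A. The acyclic ordering is obtained by sorting on the number of in-neighbors in the closure, which is valid because an arc uv of the closure forces v to have strictly more in-neighbors than u.

```python
def enumerate_cc_sets(vertices, arcs):
    """Yield every connected convex set of a connected acyclic digraph."""
    vertices = list(vertices)
    out = {v: set() for v in vertices}
    for u, w in arcs:
        out[u].add(w)

    # Step 1: transitive closure; succ/pred are out-/in-neighbors in the closure.
    succ = {}
    for s in vertices:
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()
            for w in out[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        succ[s] = seen
    pred = {v: set() for v in vertices}
    for u in vertices:
        for w in succ[u]:
            pred[w].add(u)

    # Step 2: acyclic ordering (fewer closure in-neighbors means earlier).
    order = sorted(vertices, key=lambda v: len(pred[v]))
    pos = {v: i for i, v in enumerate(order)}

    # Step 4: subroutine B; comments give the pseudocode line numbers.
    def B(X, Y):
        A = (set().union(*(succ[x] for x in X)) - X) & Y            # line 1
        Bset = set()
        if A:                                                       # line 2
            v = max(A, key=pos.get)                                 # line 3
            R = {v} | (pred[v] & A)                                 # line 4
        else:
            Bset = (set().union(*(pred[x] for x in X)) - X) & Y     # line 6
            if Bset:                                                # line 7
                v = min(Bset, key=pos.get)                          # line 8
                R = {v} | (succ[v] & Bset)                          # line 9
        if not A and not Bset:                                      # line 10
            yield frozenset(X)
        else:
            yield from B(X | R, Y - R)                              # line 12
            yield from B(X, Y - {v})                                # line 13

    # Step 3: one top-level call per vertex.
    for i, vi in enumerate(order):
        yield from B({vi}, set(order[i + 1:]))


# Hypothetical digraph: 1->2->3->5 and 1->4->5.
arcs = [(1, 2), (2, 3), (3, 5), (1, 4), (4, 5)]
cc_sets = list(enumerate_cc_sets([1, 2, 3, 4, 5], arcs))
print(len(cc_sets))
```

Because the function is a generator, wrapping it in `itertools.islice(..., k)` stops the enumeration after the first k sets instead of producing all of them.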

Before proving the correctness of A, we consider an example. 

Example 3.1. Let D be the graph on the left below.





In Step 1, we find $A(D^{TC}) = A(D) \cup \{v_1v_3, v_2v_5, v_1v_5\}$ (above
right). Observe that $v_1, v_2, v_3, v_4, v_5$ is an acyclic ordering. We may
assume that this is the ordering found in Step 2.

For $i = 1$ in Step 3, we have $X = \{v_1\}$ and
$Y = \{v_2, v_3, v_4, v_5\} = N^+_{D^{TC}}(X)$, and we call
$B(\{v_1\}, \{v_2, v_3, v_4, v_5\}, D)$. Then in Step 4, line 1, we compute
$A = \{v_2, v_3, v_4, v_5\}$ and then, lines 3 and 4, obtain $v = v_5$,
$N^-_{D^{TC}}(v) = \{v_1, v_2, v_3, v_4\}$ and
$R = \{v_2, v_3, v_4, v_5\}$. Then, at line 12, we make a recursive call to
$B(V(D), \emptyset, D)$. In this call we have $A = B = \emptyset$ so, at line
10, the set $V(D) = \{v_1, \ldots, v_5\}$ is output and the recursive call
returns to line 13 of $B(\{v_1\}, \{v_2, v_3, v_4, v_5\}, D)$, where we make
a call to $B(\{v_1\}, \{v_2, v_3, v_4\}, D)$. We are now effectively looking
at the graph $D_1$ below.

In Step 4, lines 1-4, we compute $A = \{v_2, v_3, v_4\}$ and obtain
$v = v_4$, $N^-_{D^{TC}}(v) = \{v_1\}$ and $R = \{v_4\}$. At lines 12 and 13
we make recursive calls to $B(\{v_1, v_4\}, \{v_2, v_3\}, D)$ and
$B(\{v_1\}, \{v_2, v_3\}, D)$ respectively.

In the call to $B(\{v_1, v_4\}, \{v_2, v_3\}, D)$, lines 1-4, we obtain
$v = v_3$ and $R = \{v_2, v_3\}$. This in turn generates calls to
$B(\{v_1, v_4, v_2, v_3\}, \emptyset, D)$, which just outputs
$\{v_1, v_2, v_3, v_4\}$ and returns, and
$B(\{v_1, v_4\}, \{v_2\}, D)$. The latter call generates calls to
$B(\{v_1, v_4, v_2\}, \emptyset, D)$ and $B(\{v_1, v_4\}, \emptyset, D)$,
which output $\{v_1, v_2, v_4\}$ and $\{v_1, v_4\}$, respectively.

In the call to $B(\{v_1\}, \{v_2, v_3\}, D)$, where we are effectively
looking at $D_2$ above, we obtain $v = v_3$ and $R = \{v_2, v_3\}$. This in
turn generates calls to $B(\{v_1, v_2, v_3\}, \emptyset, D)$, which just
outputs $\{v_1, v_2, v_3\}$ and returns, and $B(\{v_1\}, \{v_2\}, D)$ (graph
above). The latter call generates calls to
$B(\{v_1, v_2\}, \emptyset, D)$ and $B(\{v_1\}, \emptyset, D)$, which output
$\{v_1, v_2\}$ and $\{v_1\}$, respectively. This completes the case $i = 1$
in Step 3, and all the cc-sets containing $v_1$ have been output.

Now we perform Step 3 with i = 2, effectively looking at the graph D4. 



The call to $B(\{v_2\}, \{v_3, v_4, v_5\}, D)$ generates further recursive
calls in the following order:

$B(\{v_2, v_5, v_3\}, \{v_4\}, D)$
    $B(\{v_2, v_5, v_3, v_4\}, \emptyset, D)$, output $\{v_2, v_3, v_4, v_5\}$
    $B(\{v_2, v_5, v_3\}, \emptyset, D)$, output $\{v_2, v_3, v_5\}$
$B(\{v_2\}, \{v_3, v_4\}, D)$
    $B(\{v_2, v_3\}, \{v_4\}, D)$, output $\{v_2, v_3\}$
    $B(\{v_2\}, \{v_4\}, D)$, output $\{v_2\}$.

Thus all the cc-sets containing $v_2$ but not $v_1$ are output.

Performing Step 3 again with $i = 3$, effectively looking at the graph $D_5$
above, the call to $B(\{v_3\}, \{v_4, v_5\}, D)$ generates the following
recursive calls

$B(\{v_3, v_5\}, \{v_4\}, D)$
    $B(\{v_3, v_5, v_4\}, \emptyset, D)$, output $\{v_3, v_4, v_5\}$
    $B(\{v_3, v_5\}, \emptyset, D)$, output $\{v_3, v_5\}$
$B(\{v_3\}, \{v_4\}, D)$, output $\{v_3\}$

which output all the cc-sets containing $v_3$ but not $v_1$ or $v_2$.

For the case $i = 4$ in Step 3 we get the following calls

$B(\{v_4\}, \{v_5\}, D)$
    $B(\{v_4, v_5\}, \emptyset, D)$, output $\{v_4, v_5\}$
    $B(\{v_4\}, \emptyset, D)$, output $\{v_4\}$

and for $i = 5$ we get

$B(\{v_5\}, \emptyset, D)$, output $\{v_5\}$

after which A terminates.



Lemma 3.2. Algorithm A correctly outputs all cc-sets of D. 

Proof. Recall that, by Lemma 2.1, the cc-sets of D are precisely the
cc-sets of $D^{TC}$. We prove the result for $D^{TC}$.

Firstly we show that all the sets X output by A are in $CC(D^{TC})$. We will
show that within A, for any call $B(X, Y, D)$ we have that
$X \cap Y = \emptyset$, $X \cup Y$ is convex and X is a cc-set. This is
clearly sufficient as X is the only set output.

These properties hold for Step 3 when
$B(\{v_i\}, \{v_{i+1}, \ldots, v_n\}, D)$ is called, as we have chosen an
acyclic ordering of the vertices. Thus we assume that the properties hold
for the sets X, Y and consider the pairs of sets $X \cup R$,
$Y \setminus R$ and $X$, $Y \setminus \{v\}$ constructed in $B(X, Y, D)$. In
both cases clearly the intersections are empty, and since
$R \subseteq N^+_{D^{TC}}(X) \cup N^-_{D^{TC}}(X)$, $X \cup R$ is connected.

Now we will prove that $X \cup R$ is convex. Suppose that there is a path
$u, y, w$ where $u, w \in X \cup R$. Note that if there exists an
$(X \cup R)$-path then by transitivity of $D^{TC}$ there exists an
$(X \cup R)$-path of length two. By convexity of $X \cup Y$ we have
$y \in X \cup Y$. Also, $y \neq v$ as we have chosen v to be either the
maximal element of $N^+_{D^{TC}}(X)$ or the minimal element of
$N^-_{D^{TC}}(X)$, and $D^{TC}$ is transitive and thus the presence of the
arcs uy and yw implies the presence of the arc uw. Assume that
$A \neq \emptyset$. Then $R \subseteq N^+_{D^{TC}}(X)$. Since
$u \in X \cup N^+_{D^{TC}}(X)$ and the arc uy exists, the transitivity of
$D^{TC}$ implies that $y \in N^+_{D^{TC}}(X)$. Since X is convex it follows
that not both vertices u, w can be in X and that there is no arc from
$N^+_{D^{TC}}(X)$ to X. Thus $w \notin X$ and so
$w \in R \subseteq N^+_{D^{TC}}(X)$. By the transitivity of $D^{TC}$ and the
fact that yw exists and that $w \in \{v\} \cup N^-_{D^{TC}}(v)$ we have
$y \in N^-_{D^{TC}}(v)$ and thus $y \in R$. Similarly if $A = \emptyset$
then $R \subseteq N^-_{D^{TC}}(X)$ and by the transitivity of $D^{TC}$ and
since $w \in X \cup N^-_{D^{TC}}(X)$ we have $u \in R$ and thus
$y \in N^-_{D^{TC}}(X) \cap N^+_{D^{TC}}(v)$, so $y \in R$.

Secondly we show that if $S \neq \emptyset$ is cc then S is output by A. If
S is a cc-set and $j = \min\{i : v_i \in S\}$ then
$\{v_j\} \subseteq S \subseteq \{v_j, v_{j+1}, \ldots, v_n\}$. Thus it is
sufficient to show that if S is cc and $X \subseteq S \subseteq X \cup Y$
then $B(X, Y, D)$ outputs S. We prove this by induction on $|Y|$.

If $N^+_{D^{TC}}(X) \cap Y = \emptyset = N^-_{D^{TC}}(X) \cap Y$ then, since
S is connected, $S = X$ and $B(X, Y, D)$ outputs X at line 10. This proves
the result for $|Y| = 0$, and for $|Y| \geq 1$ we may assume that
$v \in (N^+_{D^{TC}}(X) \cup N^-_{D^{TC}}(X)) \cap Y$.

If $v \notin S$ then we have
$X \subseteq S \subseteq X \cup (Y \setminus \{v\})$ and
$|Y \setminus \{v\}| < |Y|$, so by induction the call to
$B(X, Y \setminus \{v\}, D)$ at line 13 outputs S. If
$r \in R \setminus \{v\}$ we have arcs rv and xr (or, when
$A = \emptyset$, arcs vr and rx), for some $x \in X \subseteq S$. Thus, if
$v \in S$, by convexity of S we have $R \subseteq S$. Then, since
$|Y \setminus R| < |Y|$, the call to $B(X \cup R, Y \setminus R, D)$ at line
12 outputs S. □

Lemma 3.3. The running time of A is $O(n \cdot cc(D))$.

Proof. Note that by Theorem 2.3 and the fact that D is connected we have
$n \cdot cc(D) \geq n^2(n+1)/2$. Therefore the transitive closure of D can
be found in $O(n \cdot cc(D))$ time, by Theorem 2.2. It is well-known that
an acyclic ordering can be found in time $O(n + m)$, see, e.g., [5], and
clearly the sets $N^+_{D^{TC}}(v)$ and $N^-_{D^{TC}}(v)$ can be computed at
the start of the algorithm in $O(n)$ time, for each $v \in V(D)$.

We will now show that $B(X, Y, D)$ runs in time
$O(|Y| \cdot cc'(X, Y) + K_{X,Y})$, where $cc'(X, Y)$ is the number of
cc-sets S such that $X \subseteq S \subseteq X \cup Y$ and $K_{X,Y}$ is the
sum of the sizes of the sets S. Note that B returns at line 10 or makes two
recursive calls to B (lines 12, 13). If B returns at line 10 then we call
this a leaf call, otherwise the function call is an internal call. All
function calls can be viewed as nodes of a binary tree (every node is a leaf
or has two children) whose leaves and internal nodes correspond to calls to
B. It is easy to see, by induction, that the number of internal nodes equals
the number of leaves minus one. It is easy to see, by induction on the depth
of the call tree, that B outputs each set S only once
($B(X \cup R, Y \setminus R, D)$ and $B(X, Y \setminus \{v\}, D)$ output
those that contain v and do not contain v, respectively). Thus we have
$cc'(X, Y)$ leaf calls and $cc'(X, Y) - 1$ internal calls.



We assume that the set implementation allows us to find the size of a set
and the largest and smallest elements of the set in unit time. Then the time
taken by a call $B(X, Y, D)$ depends on the time taken to calculate the sets
A, B and R. Since $A, R \subseteq Y$, the time to compute R is at most
$O(|Y|)$. If we implement $B(X, Y, D)$ so that $N^+_{D^{TC}}(X) \cap Y$ and
$N^-_{D^{TC}}(X) \cap Y$ are passed in as parameters then the time taken to
calculate A and B is at most $O(|Y|)$. By definition of R we have that
$N^+_{D^{TC}}(X \cup R) = N^+_{D^{TC}}(X) - R$ and
$N^-_{D^{TC}}(X \cup R) = N^-_{D^{TC}}(X) \cup N^-_{D^{TC}}(v) - R - X$
provided $A \neq \emptyset$, and
$N^-_{D^{TC}}(X \cup R) = N^-_{D^{TC}}(X) - R$ and
$N^+_{D^{TC}}(X \cup R) = N^+_{D^{TC}}(X) \cup N^+_{D^{TC}}(v) - R - X$
provided $A = \emptyset$ (and $B \neq \emptyset$). Since $R \subseteq Y$,
these sets can be computed in $O(|Y|)$ time.

If $B(X, Y, D)$ calls $B(X', Y', D)$ then $|Y'| < |Y|$; thus a call to B at
an internal node takes at most $O(|Y|)$ time, and a call at a leaf node
takes at most $O(|Y| + |X|)$ time, giving the desired total time bound of
$O(|Y| \cdot cc'(X, Y) + K_{X,Y})$.

We let $K_i$ denote the sum of the sizes of all the cc-sets S such that
$v_i \in S \subseteq \{v_i, v_{i+1}, \ldots, v_n\}$, and observe that
$K_1 + \ldots + K_n \leq n \cdot cc(D)$.

Finally, by Step 3, we conclude that the total running time is

$$O\left(\sum_{i=1}^{n} \bigl(cc'(\{v_i\}, \{v_{i+1}, v_{i+2}, \ldots, v_n\}) \cdot (n - i) + K_i\bigr)\right) = O(cc(D) \cdot n).$$

□
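
The last equality can be spelled out as follows. This is a worked restatement of the step, writing $cc'_i$ as shorthand (introduced here only for brevity) for $cc'(\{v_i\}, \{v_{i+1}, \ldots, v_n\})$; it uses that every cc-set is counted for exactly one i, namely the smallest index of its vertices, together with the bound on $K_1 + \ldots + K_n$ observed above.

```latex
\begin{align*}
\sum_{i=1}^{n} \bigl( cc'_i \cdot (n-i) + K_i \bigr)
  &\le n \sum_{i=1}^{n} cc'_i + \sum_{i=1}^{n} K_i \\
  &=   n \cdot cc(D) + \sum_{i=1}^{n} K_i
       && \text{since } \sum_{i=1}^{n} cc'_i = cc(D) \\
  &\le n \cdot cc(D) + n \cdot cc(D) = 2n \cdot cc(D)
       && \text{since } K_1 + \ldots + K_n \le n \cdot cc(D).
\end{align*}
```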



Theorem 3.4. Algorithm A is correct and its time and space complexities
are $O(n \cdot cc(D))$ and $O(n^2)$, respectively.

Proof. The correctness and time complexity follow from the two lemmas
above. The space complexity is dominated by the space complexity of Step
1, $O(n^2)$. □



Since $cc(D)$ may well be exponential, we may wish to generate only a
restricted number k of cc-sets. Theorem 3.5 can be viewed as a result in
fixed-parameter algorithmics [8] with k being a parameter.

Theorem 3.5. To output k cc-sets the algorithm A requires
$O(n^{2.376} + kn)$ time.

Proof. We may assume that k is at most the number of cc-sets containing
vertex $v_1$ since otherwise the proof is analogous.

We consider the binary tree T introduced in the proof of Lemma 3.3 and
prove our claim by induction on k. It takes $O(n^{2.376})$ time to perform
Steps 1, 2 and 3. It takes $O(n)$ internal nodes of T to reach the first
leaf of T and, thus, for $k = 1$ we obtain $O(n^{2.376} + n)$ time. Assume
that $k \geq 2$. Let x be the first leaf of T reached by A, let y be the
parent of x in T, let z be the other child of y in T and let u be the parent
of y. Observe that after deleting the nodes x and y and adding an edge
between u and z, we obtain a new binary tree $T'$. By the induction
hypothesis, to reach the first $k - 1$ leaves in $T'$, we need
$O(n^{2.376} + (k - 1)n)$ time. To reach the first k leaves in T, we need to
reach x and the first $k - 1$ leaves in $T'$. Thus, we need to add to
$O(n^{2.376} + (k - 1)n)$ the time required to visit x and y only, which is
$O(n)$. Thus, we have proved the desired bound $O(n^{2.376} + kn)$. □



4 Generating Convex Sets in Acyclic Digraphs 

It is not hard to modify A such that the new algorithm will generate all
convex sets of an acyclic digraph D in time $O(n \cdot co(D))$, where
$co(D)$ is the number of convex sets in D. However, a faster algorithm is
possible and we present one in this section.

To obtain all convex sets of D (and $\emptyset$, which is not convex by
definition), we call the following recursive procedure with the original
digraph D and with $F = \emptyset$. This call yields an algorithm whose
properties are studied below.

A vertex x is a source (sink) if it has no in-neighbors (out-neighbors). In
general, the procedure CS takes as input an acyclic digraph $D = (V, A)$ and
a set $F \subseteq V$ and outputs all convex sets of D which contain F. The
procedure CS outputs V and then considers all sources and sinks of the graph
that are not in F. For each such source or sink s, we call $CS(D - s, F)$
and then add s to F. Thus, for each sink or source $s \in V \setminus F$ we
consider all sets that contain s and all sets that do not contain s.

$CS(D = (V, A), F)$

1. output V
2. for all $s \in V \setminus F$ with $|N^+_D(s)| = 0$ or $|N^-_D(s)| = 0$ do {
3. for all vertices v find $N^+_{D-s}(v)$ and $N^-_{D-s}(v)$
4. call $CS(D - s, F)$; set $F := F \cup \{s\}$
5. for all vertices v find $N^+_D(v)$ and $N^-_D(v)$ }

4.1 Correctness of the procedure 

Proposition 4.2 and Theorem 4.3 imply that the procedure CS is correct.
We first show that all sets generated in line 1 are, in fact, convex sets.
To this end, we use the following lemma.

Lemma 4.1. Let D be an acyclic digraph, let X be a convex set of D, and
let $s \in X$ be a source or sink of $D[X]$. Then $X \setminus \{s\}$ is a
convex set of D.

Proof. Suppose that $X \setminus \{s\}$ is not convex in D. Then there exist
two vertices $u, v \in X \setminus \{s\}$ and a directed path P from u to v
which contains a vertex not in $X \setminus \{s\}$. Since X is convex, P
only uses vertices of X and in particular $s \in P$. Thus, there is a
subpath $u'sv'$ of P with $u', v' \in X$. But since s is a source or a sink
in $D[X]$ such a subpath cannot exist, a contradiction. □

Now we can prove the following proposition. 

Proposition 4.2. Let $D = (V, A)$ be an acyclic digraph and let
$F \subseteq V$. Then every set output by $CS(D, F)$ is convex.

Proof. We prove the result by induction on the number of vertices of the
output set. The entire vertex set V is convex and is output by the
procedure. Now assume all sets of size $n - i \geq 2$ that are output by the
procedure are convex. We will show that all sets of size $n - i - 1$ that
are output are also convex. When a set C is output the procedure
$CS(D[C], F')$ was called for some set $F' \subseteq V$. The only way
$CS(D[C], F')$ can be invoked is that there exist a set $C' \subseteq V$ and
a source or sink c of $D[C']$ with $C = C' \setminus \{c\}$. Moreover $C'$
will be output by the procedure and, thus, by our assumption is convex. The
result now follows from Lemma 4.1. □

Theorem 4.3. Let $D = (V, A)$ be an acyclic digraph and let
$F \subseteq V$. Then every convex set of D containing F is output exactly
once by $CS(D, F)$.

Proof. Let C be a convex set of D containing F. We first claim that there
exist vertices $c_1, c_2, \ldots, c_t \in V$ with
$V = \{c_1, c_2, \ldots, c_t\} \cup C$ such that $c_i$ is a source or sink
of $D[C \cup \{c_i, c_{i+1}, \ldots, c_t\}]$ for all
$i \in \{1, 2, \ldots, t\}$. To prove the claim we will show that for every
convex set H with $C \subset H \subseteq V$, there exists a source or sink
$s \in H \setminus C$ of the digraph $D[H]$. This will prove our claim as by
Lemma 4.1 $H \setminus \{s\}$ is a convex set of D and we can repeatedly
apply the claim.

If there exists no arc from a vertex of C to a vertex D[H \ C] then any 
source of i7\C is a source oi D[H]. Note that D[H\C] is an acyclic digraph 
and, thus, has at least one source (and sink). Thus we may assume that 
there is an arc from a vertex m of C to a vertex v oi H \C. Consider a 
longest path v = viV2- ■ - Vr in D[H \ C] leaving v. Observe that Vr is a 
sink D[H \ C] and, moreover, there is no arc from to any vertex of C 
since otherwise there would be a directed path from u G C to a vertex in C 
containing vertices in H\C which is impossible as C is convex. Hence Vr is 
a sink of D[H] and the claim is shown. 

Next note that a sink or source remains a sink or source when vertices are
deleted. Thus when $CS(D, F)$ is executed and a source or sink $s$ is
considered, we distinguish the case when $s = c_i$ for some
$i \in \{1, 2, \ldots, t\}$ from the case when this does not hold. If
$s = c_i$ and we currently consider the digraph $D'$ and the fixed set $F'$,
then we follow the execution path calling $CS(D' - s, F')$. Otherwise we
follow the execution path that adds $s$ to the fixed set. When the last
$c_i$ is deleted, we call $CS(D[C], F'')$ for some $F''$ and the set $C$ is
output. It remains to show that there is a unique execution path yielding
$C$. To see this, note that when we consider a source or sink $s$ then it
is either deleted or moved to the fixed set $F$. Thus every vertex is
considered at most once and then deleted or fixed. Therefore each time we
consider a source or sink there is a unique decision that finally yields
$C$. □
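
To make the execution paths in the proof concrete, here is a minimal Python
sketch of the recursion as the proof describes it: output the current
vertex set, then for every source or sink $s$ outside the fixed set recurse
on $D - s$ and afterwards move $s$ to the fixed set. It is not the authors'
C++ implementation. The name convex_sets, the dict-of-sets input format and
the guard that stops the recursion before the vertex set becomes empty are
our assumptions, and the sketch recomputes sources and sinks with set
operations rather than with the counters of Section 4.2.

```python
def convex_sets(succ, fixed=()):
    """Enumerate the non-empty convex sets containing `fixed`.

    succ: dict vertex -> set of out-neighbours of an acyclic digraph
    (assumed input format).
    """
    pred = {v: set() for v in succ}
    for u, outs in succ.items():
        for w in outs:
            pred[w].add(u)

    found = []

    def cs(V, F):
        found.append(frozenset(V))   # output the current vertex set
        F = set(F)                   # local copy; F only grows within this call
        for s in sorted(V):
            if s in F:
                continue
            is_source = not (pred[s] & V)
            is_sink = not (succ[s] & V)
            if is_source or is_sink:
                if len(V) > 1:       # assumption: never report the empty set
                    cs(V - {s}, F)   # branch 1: delete s
                F.add(s)             # branch 2: keep s in all later sets

    cs(frozenset(succ), fixed)
    return found


# Diamond: a -> b -> d and a -> c -> d.
diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
sets = convex_sets(diamond)
print(len(sets), len(set(sets)))   # 12 12
```

On the diamond the three missing subsets are {a, d}, {a, b, d} and
{a, c, d}, each of which omits a vertex lying on a path from a to d.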

4.2 Running time of CS 

We assume that the input acyclic digraph $D = (V, A)$ is given by two
adjacency lists for each vertex (in-neighbors and out-neighbors), and that
the numbers of in-neighbors and out-neighbors are stored for each vertex.
One can obtain this information at the beginning in $O(n + m)$ time, where
$n$ ($m$) is the number of vertices (arcs) of the input connected acyclic
digraph $D$. Observe that we output the vertex set of $D$ as one convex
set. Thus, it suffices to show that the running time of $CS(D, F)$ without
the recursive calls is $O(|V|)$. This will yield the running time
$O(\sum_{C \in \mathcal{CO}(D)} |C|)$ by Theorem 4.3, since every convex
set is output exactly once.

Since we have stored the number of in-neighbors and out-neighbors for every
vertex $v \in V$, we can determine all sources and sinks in $O(|V|)$ time.
For the recursive calls of CS we delete one vertex and have to update the
number of in-neighbors, respectively out-neighbors, of all neighbors of the
deleted vertex $s$. The vertex $s$ has at most $|V| - 1$ neighbors and we
can charge the cost of updating this information to the call of
$CS(D - s, F)$. Moreover, we store the neighbors of $s$ so that we can
reintroduce them after the call of $CS(D - s, F)$. Moving the sinks and
sources to $F$ needs constant time for each source or sink and thus we
obtain $O(|V|)$ time in total.

In summary we initially need $O(n + m)$ time, and then each call of
$CS(D, F)$ is charged with $O(|V|)$ before it is called and then
additionally with $O(|V|)$ time during its execution. Since each call
outputs a convex set of size $|V|$, the total running time is
$O(n + m) + O(\sum_{C \in \mathcal{CO}(D)} |C|)$. Since
$\sum_{C \in \mathcal{CO}(D)} |C| = \Omega(n^2)$ by Theorem 2.3, the
running time of CS is $O(\sum_{C \in \mathcal{CO}(D)} |C|)$.
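
The bookkeeping used in this analysis can be sketched as follows. This is a
minimal Python illustration under our own assumptions, not the authors'
data structure: the class name CountedDigraph and its methods are
hypothetical. Deleting a vertex $s$ touches only the counters of its
neighbors, and restoring it undoes exactly those updates, provided that
deletions are undone in last-in, first-out order as in the recursion.

```python
class CountedDigraph:
    """Acyclic digraph with stored in- and out-degree counters."""

    def __init__(self, succ):
        # succ: dict vertex -> set of out-neighbours (assumed input format)
        self.succ = {v: set(ws) for v, ws in succ.items()}
        self.pred = {v: set() for v in succ}
        for u, ws in succ.items():
            for w in ws:
                self.pred[w].add(u)
        self.alive = set(succ)
        self.indeg = {v: len(self.pred[v]) for v in succ}
        self.outdeg = {v: len(self.succ[v]) for v in succ}

    def sources_and_sinks(self):
        # One pass over the current vertex set: O(|V|).
        return [v for v in self.alive
                if self.indeg[v] == 0 or self.outdeg[v] == 0]

    def delete(self, s):
        # Cost proportional to the number of neighbours of s.
        self.alive.remove(s)
        for w in self.succ[s]:
            if w in self.alive:
                self.indeg[w] -= 1
        for u in self.pred[s]:
            if u in self.alive:
                self.outdeg[u] -= 1

    def restore(self, s):
        # Must undo the most recent deletion not yet restored (LIFO order).
        for w in self.succ[s]:
            if w in self.alive:
                self.indeg[w] += 1
        for u in self.pred[s]:
            if u in self.alive:
                self.outdeg[u] += 1
        self.alive.add(s)


g = CountedDigraph({"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()})
before = sorted(g.sources_and_sinks())   # ['a', 'd']
g.delete("a")
during = sorted(g.sources_and_sinks())   # ['b', 'c', 'd']
g.restore("a")
after = sorted(g.sources_and_sinks())    # ['a', 'd']
```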



5 Implementation and Experimental Results 

In order to test our algorithms A and CS for practicality we have
implemented and run them on several instances of DDGs of basic blocks. We
have compared our algorithm A with the state-of-the-art algorithm of Chen,
Maskell and Sun [6] (the CMS algorithm) using their own implementation,
but with the code for I/O constraint checking removed so as to ensure that
their algorithm was not disadvantaged. For completeness we have also
compared CS to Atasu, Pozzi and Ienne's algorithm [19] (the API06
algorithm). All the algorithms were coded in C++ and all experiments were
carried out on a 2 × Dual Core AMD Opteron 265 1.8 GHz processor with 4 GB
RAM, running SUSE Linux 10.2 (64 bit).

Our first set of tests is based on C and C++ programs taken from the
benchmark suites of MiBench [13] and Trimaran [22]. We compiled these
benchmarks for both the Trimaran (A, B, C, D, E) and SimpleScalar [3]
(F, G, H, I) architectures. From here we examined the control-flow graph
for each program to select a basic block within a critical loop of the
program (often this block had been unrolled to some degree to increase
the potential for efficiency improvements).

ID   NV   NA           NS   CMS (CT)   A (CT)
A    35   38      139,190        170       96
B    42   45    4,484,110      5,546    3,246
C    26   28        5,891          6        4
D    39   94    3,968,036      4,346    2,710
E    45   44    1,466,961      1,750    1,156
F    24   22       46,694         60       30
G    20   19          397
H    20   21        1,916
I    43   47   10,329,762     13,146    7,210

Table 1: cc-sets for benchmark programs

We considered basic blocks, ranging from 20 to 45 lines of low-level
intermediate code, for which we generated the DDGs. We then selected, from
these DDGs, the non-trivial connected components on which to run our
algorithms.

We give some preference to benchmarks which suit the intended application
of the research, taking our test cases from security applications,
including benchmarks for the Advanced Encryption Standard (B, C), and
safety-critical software (A, E). We also include a basic example from the
Trimaran benchmark suite: Hyper (D), an algorithm that performs quick sort
(F), part of a JPEG algorithm (G), and an example from the FFT benchmark in
MiBench containing C source code for performing Discrete Fast Fourier
Transforms (H). The final example (I) is taken from the standard Blowfish
benchmark, an encryption algorithm.

The results we have obtained are given in Table 1. In the following tables
NV denotes the number of vertices, NA the number of arcs, NS the number of
generated sets, and CT the clock time in $10^{-3}$ CPU seconds; for the
benchmark data, ID identifies the benchmark.

For examples G and H both algorithms ran in almost no time. For the other
examples, the above results demonstrate that our algorithm A outperforms
the CMS algorithm.

NV    NA          NS   CMS (CT)   A (CT)
15    56      32,400         30       16
16    64      65,041         56       23
17    72     130,322        114       60
18    81     261,139        240      113
19    90     522,722        540      253
20   100   1,046,549      1,080      513
21   110   2,094,102      2,166    1,048
22   121   4,190,231      4,086    2,156

Table 2: cc-sets for graphs with maximum number of cc-sets



ID   NV   NA          NS   API06 (CT)   CMS (CT)   CS (CT)
A    35   38   1,123,851        2,560      1,390       270
C    26   28     120,411          250        120        40
F    24   22   3,782,820        3,250      3,630       860
G    20   19     122,111           70        120        30
H    20   21      55,083          110        110        20

Table 3: All convex sets for benchmark programs



We also consider examples with worst-case numbers of cc-sets. Let, as in
Theorem 2.4, $\vec{K}_{p,q}$ denote the digraph obtained from the complete
bipartite graph $K_{p,q}$ by orienting every edge from the partite set of
cardinality $p$ to the partite set of cardinality $q$. By Theorem 2.4, the
digraphs $\vec{K}_{a,n-a}$ with $|n - 2a| \le 1$ have the maximum possible
number of cc-sets. Our experimental results for digraphs $\vec{K}_{a,n-a}$
with $|n - 2a| \le 1$ are given in Table 2. Again we see that A outperforms
the CMS algorithm.
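
As a sanity check on these instances, the number of cc-sets of
$\vec{K}_{p,q}$ can be counted directly. The sketch below is our own
illustration, with hypothetical function names. It rests on an observation
that is not stated in this section: every directed path in $\vec{K}_{p,q}$
has length one, so every vertex set is convex, and a set is connected
exactly when it is a single vertex or meets both partite sets. This gives
the closed form $(2^p - 1)(2^q - 1) + p + q$, which the code compares with
a brute-force count taken straight from the definitions.

```python
from itertools import combinations


def oriented_complete_bipartite(p, q):
    """Out-neighbour map with all arcs from 0..p-1 to p..p+q-1."""
    right = set(range(p, p + q))
    return {v: (set(right) if v < p else set()) for v in range(p + q)}


def reachable(nbrs, start):
    """Vertices reachable from start by a walk with at least one step."""
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        for w in nbrs[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_convex(succ, X):
    # X is convex if no outside vertex lies on a path between vertices of X.
    below_X = set().union(*(reachable(succ, x) for x in X))
    return not any(reachable(succ, w) & X for w in below_X - X)


def is_connected(succ, X):
    und = {v: set() for v in X}      # underlying undirected graph of D[X]
    for u in X:
        for w in succ[u] & X:
            und[u].add(w)
            und[w].add(u)
    start = next(iter(X))
    return reachable(und, start) | {start} == X


def count_cc_sets(succ):
    vertices = list(succ)
    return sum(
        1
        for k in range(1, len(vertices) + 1)
        for chosen in combinations(vertices, k)
        if is_connected(succ, set(chosen)) and is_convex(succ, set(chosen))
    )


def cc_closed_form(p, q):
    return (2 ** p - 1) * (2 ** q - 1) + p + q


print(count_cc_sets(oriented_complete_bipartite(2, 3)), cc_closed_form(2, 3))
print(cc_closed_form(7, 8), cc_closed_form(8, 8))   # 32400 65041
```

The last line reproduces the NS entries of Table 2 for $n = 15$ and
$n = 16$.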

We have compared algorithm CS with both CMS running in 'unconnected' mode
and with API06. The examples used are the same as in Table 1; however, we
do not give results for examples B, D, E and I, as these graphs produce an
extremely large number of convex sets and, as a result, the algorithms do
not terminate in reasonable time. The results are shown in Table 3. We can
see that although CMS generally outperforms API06, there are two cases
where API06 is marginally better. However, CS is consistently three to
five times faster than either of the other algorithms.

NV    NA          NS   API06 (CT)   CMS (CT)   CS (CT)
15    56      32,768           40         40        10
16    64      65,536           70         70        30
17    72     131,072          140        130        60
18    81     261,144          320        320       130
19    90     524,288          720        700       320
20   100   1,046,575        1,590      1,500       710
21   110   2,097,152        3,320      3,010     1,500
22   121   4,194,304        7,140      6,310     3,120

Table 4: All convex sets for graphs with maximum number of cc-sets

For interest we have also compared API06, CMS and CS on the digraphs that
have maximal numbers of cc-sets. The results are shown in Table 4. Again,
while CMS and API06 are roughly comparable, CS is at least twice as fast
as both of them.

6 Connected Sets Generation Algorithm 

Let $G$ be a connected (undirected) graph with vertex set
$V(G) = \{v_1, v_2, \ldots, v_n\}$ and let $G$ have $m$ edges. For a vertex
$x \in V(G)$ and a set $X \subseteq V(G)$, let
$N(x) = \{z \in V(G) : xz \in E(G)\}$ and
$N(X) = \bigcup_{x \in X} N(x) \setminus X$. The following is an algorithm,
C, for generating all connected sets of $G$.

Step 1: For each $i = 1, 2, \ldots, n$ do the following. Set $X := \{v_i\}$
and $Y := \{v_{i+1}, v_{i+2}, \ldots, v_n\}$. Initiate the set $N_X$ as
$N_X := N(X) \cap Y$.

Step 2 (subroutine $\mathcal{D}$): Comment: $\mathcal{D}$ finds all
connected sets $Q$ in $G$ such that $X \subseteq Q \subseteq X \cup Y$.

(2a): If $N_X = \emptyset$ then return the connected set $X$ (and stop).
(2b): If $N_X \neq \emptyset$, then let $v \in N_X$ be arbitrary.

(2c): Comment: In this step we will find all connected sets $S$ such
that $X \cup \{v\} \subseteq S \subseteq (X \cup Y)$.

Set $N_{X,0} := N_X$, $X_0 := X$ and $Y_0 := Y$. Remove $v$ from $Y$ and
$N_X$, and add it to $X$. For every $u \in Y \setminus N_X$ check whether
$u$ has an edge to $v$ and if it does then add it to $N_X$.

Make a recursive call to subroutine $\mathcal{D}$. Comment: we consider
the new $X$ and $Y$.

Change $N_X$, $X$, and $Y$ back to their original state by setting
$N_X := N_{X,0}$, $X := X_0$, and $Y := Y_0$.

(2d): Comment: In this step we will find all connected sets $S$ such
that $X \subseteq S \subseteq (X \cup Y)$ and $v \notin S$. Remove $v$
from $Y$ and remove $v$ from $N_X$.

Make a recursive call to subroutine $\mathcal{D}$.

Change $Y$ back to its original state by adding $v$ back to $Y$. Also
change $N_X$ back to its original state by adding $v$ to it.
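
The steps above can be sketched in Python as follows. This is a minimal
illustration under our own assumptions rather than a reference
implementation: the function name connected_sets and the adjacency-dict
input are hypothetical, and for readability the sketch passes fresh copies
of $X$, $Y$ and $N_X$ to each recursive call instead of modifying and
restoring them in place as Steps (2c) and (2d) do.

```python
def connected_sets(adj):
    """Enumerate all connected vertex sets of an undirected graph.

    adj: dict vertex -> set of neighbours (assumed input format, no loops).
    The vertex order v_1, ..., v_n is taken to be sorted(adj).
    """
    order = sorted(adj)
    result = []

    def extend(X, Y, NX):
        if not NX:                       # (2a): nothing left to branch on
            result.append(frozenset(X))
            return
        v = next(iter(NX))               # (2b): arbitrary vertex of N_X
        rest = Y - {v}
        # (2c): sets containing v; neighbours of v in Y join N_X
        extend(X | {v}, rest, (NX - {v}) | (adj[v] & rest))
        # (2d): sets avoiding v
        extend(X, rest, NX - {v})

    for i, vi in enumerate(order):       # Step 1
        Y = set(order[i + 1:])
        extend({vi}, Y, adj[vi] & Y)
    return result


path = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(len(connected_sets(path)), len(connected_sets(triangle)))   # 6 7
```

In this sketch every call either outputs a set or makes exactly two further
calls, so the number of calls is less than twice the number of connected
sets.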

Similarly to Theorem 3.4 one can prove the following:

Theorem 6.1. Let $c(G)$ be the number of connected sets of a connected
graph $G$. Algorithm C is correct and its time and space complexities are
$O(n \cdot c(G))$ and $O(n + m)$, respectively.

7 Discussions and Open Problems 

Our computational experiments show that A performs well and is of defi- 
nite practical interest. We have tried various heuristic approaches to speed 
up the algorithm in practice, but all approaches were beneficial for some 
instances and inferior to the original algorithm for some other instances. 
Moreover, no approach could significantly change the running time. The 
algorithm was developed independently from the CMS algorithm. However, 
the two algorithms are closely related, and work continues to isolate the 
implementation effects that give the performance differences. 